Text segmentation based on term repetition distances
Related resources
Text Segmentation based on Semantic Word Embeddings
We explore the use of semantic word embeddings [14, 16, 12] in text segmentation algorithms, including the C99 segmentation algorithm [3, 4] and new algorithms inspired by the distributed word vector representation. By developing a general framework for discussing a class of segmentation objectives, we study the effectiveness of greedy versus exact optimization approaches and suggest a new iter...
Text Segmentation Based on Similarity between Words
This paper proposes a new indicator of text structure, called the lexical cohesion profile (LCP), which locates segment boundaries in a text. A text segment is a coherent scene; the words in a segment are linked together via lexical cohesion relations. LCP records mutual similarity of words in a sequence of text. The similarity of words, which represents their cohesiveness, is computed using a s...
Text Segmentation into Paragraphs Based on Local Text Cohesion
The problem of automatic text segmentation is subcategorized into two different problems: thematic segmentation into rather large, topically self-contained sections, and splitting into paragraphs, i.e., lower-level lexico-grammatical segmentation. In this paper we consider the latter problem. We propose a method of reasonably splitting text into paragraphs based on a text cohesion measure. Speci...
Text Classification of Technical Papers Based on Text Segmentation
The goal of this research is to design a multi-label classification model which determines the research topics of a given technical paper. Based on the idea that papers are well organized and some parts of papers are more important than others for text classification, segments such as title, abstract, introduction and conclusion are intensively used in text representation. In addition, new feat...
Histogram Based Segmentation Using Wasserstein Distances
We developed variational models for image segmentation that incorporate histogram information into level-set-based curve evolution techniques. The novelty is in the use of Wasserstein mass transfer metrics in order to compare histograms; we found that this improves the results significantly over previous ... [Figure caption: Left: segmentation based on average intensities via the Chan-Vese model. Right: proposed model b...]
Journal
Journal title: Journal of Natural Language Processing
Year: 2006
ISSN: 1340-7619, 2185-8314
DOI: 10.5715/jnlp.13.2_3